Entropies must correspond to mean values for them to be measurable. The Shannon entropy
corresponds to the weighted arithmetic mean, whereas the Renyi entropy corresponds to the expo- 
nential mean. These means refer to code lengths, which are converted into entropies by replacing 
the length of a sequence by the negative logarithm of the probability of its occurrence. Only affine 
and exponential generating functions of means preserve the property of additivity and invariance 
under translations, and hence are Kolmogorov-Nagumo functions, resulting in the Shannon and 
Renyi entropies, respectively. Pseudo-additive entropies are generating functions of means of order
$0 < \tau < 1$, which is the exponential of the Renyi entropy, or in the $\tau = 0$ limit, the Shannon entropy.
Means of any order cannot be expressed as escort averages because such averages contradict the 
fact that the means are monotonically increasing functions of their order. Exponential mean error 
functions of Renyi, in general, and Shannon, in particular, are shown to be measures of the extent 
of a distribution. 
ENTROPY AND CODING THEORY

In his seminal paper on entropy and information, Renyi
laid down the fundamental properties of entropy and
its relation to a measure of information. Many of these
tenets have been all but abandoned.

Basing himself on the relation between the mean code length,

$$L = \sum_{i=1}^{N} p(x_i)\, n_i, \qquad (1)$$

and the Shannon entropy, Renyi argued convincingly that
any putative candidate for an entropy should be a mean.
Here, $p = (p(x_1), p(x_2), \ldots, p(x_N))$ is the set of probabilities
of $N$ input symbols, $x = (x_1, x_2, \ldots, x_N)$, that are to
be encoded using an alphabet of size $D$. Each $p(x_i) > 0$,
and the distribution is complete, $\sum_{i=1}^{N} p(x_i) = 1$. The
$x_i$ represent a sequence of $n_i$ characters taken from the
alphabet.

There exists a uniquely decipherable code with lengths
$n_i$ iff

$$\sum_{i=1}^{N} D^{-n_i} \le 1, \qquad (2)$$

which is known as Kraft's inequality. The equality is
automatically guaranteed by setting

$$n_i = -\log_D p(x_i), \qquad (3)$$

which represents the amount of information received by
knowing that an event of probability $p(x_i)$ has occurred.
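As a minimal numerical sketch of (1)-(3), the following Python fragment evaluates the Kraft sum and the mean code length for a small source. Python, the function names and the dyadic example distribution are illustrative assumptions, not part of the original text.

```python
import math

def kraft_sum(lengths, D=2):
    """Left-hand side of Kraft's inequality (2): sum_i D**(-n_i)."""
    return sum(D ** (-n) for n in lengths)

def ideal_lengths(probs, D=2):
    """Code lengths from condition (3): n_i = -log_D p(x_i)."""
    return [-math.log(p, D) for p in probs]

def mean_code_length(probs, lengths):
    """Weighted arithmetic mean (1): L = sum_i p(x_i) * n_i."""
    return sum(p * n for p, n in zip(probs, lengths))

# Hypothetical dyadic source, chosen so that the ideal lengths are whole numbers.
probs = [0.5, 0.25, 0.125, 0.125]
lengths = ideal_lengths(probs)

print(kraft_sum(lengths))                # close to 1: equality in (2)
print(mean_code_length(probs, lengths))  # close to 1.75 code symbols
print(kraft_sum([1, 1, 1]))              # exceeds 1: no such binary code exists
```

For non-dyadic probabilities the lengths from (3) are not integers, so the sketch only illustrates the equality condition rather than an actual code construction.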
Introducing (3) into the weighted arithmetic average (1)
gives Shannon's expression for the entropy,

$$S_S(p) = -\sum_{i=1}^{N} p_i \log_D p_i, \qquad (4)$$

where the abbreviation $p_i = p(x_i)$ has been introduced.
Renyi addressed the question of what other entropies are
obtainable when the weighted arithmetic mean, (1), is
replaced by a generalized mean

$$L(n) = \phi^{-1}\left(\sum_{i=1}^{N} p_i\,\phi(n_i)\right) = \mathfrak{M}_\phi(n), \qquad (5)$$

where $\phi(x)$ is a strictly monotonic and continuous function
that possesses an inverse $\phi^{-1}(x)$.

The generating function $\phi$ must be so chosen that the
generalized mean (5) possesses the following properties.
First and foremost, it must have the property of additivity.
If $q = (q(x_1), q(x_2), \ldots, q(x_N))$ represents another
finite discrete probability distribution then the entropy
of their direct product should be additive,

$$S(p \otimes q) = S(p) + S(q). \qquad (6)$$

This is guaranteed by the fact that the events are independent
so that their probabilities multiply, and the
entropy satisfies the functional equation, (6). According
to (3), multiplicative probabilities mean that code
lengths are additive, as they should be.

Secondly, the entropy should possess the mean-value 
property, which says that the entropy of the union of 
two distributions is the weighted arithmetic mean of the 
individual entropies. 

Other properties listed by Renyi that an entropy 
should have were symmetry, continuity, and normaliza- 
tion. 
EQUIVALENT MEANS



PSEUDO-ADDITIVITY 



In addition to these properties, any classical expression
for the entropy should be translation invariant, which
says that only entropy differences are measurable. For
$N = 2$, the generalized mean must satisfy

$$\phi^{-1}\left[p_1\phi(x_1 + a) + p_2\phi(x_2 + a)\right] = \phi^{-1}\left[p_1\phi(x_1) + p_2\phi(x_2)\right] + a, \qquad (7)$$

where $a$ is a constant. Such functions, $\phi$, were first
investigated by Kolmogorov and Nagumo, and will
be referred to as KN functions. The only known strictly
monotonic increasing solutions of the functional equation
(7) are

$$\phi(x) = ax + b,$$

affine, and

$$\phi(x) = aD^{\tau x} + b,$$

exponential functions, where $a \neq 0$ and $\tau \neq 0$.
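The translation property (7) can be checked numerically. The sketch below, in Python with hypothetical helper names and illustrative parameter values, evaluates the difference between the two sides of (7) for an affine function, an exponential function, and a cubic chosen purely for contrast.

```python
import math

def kn_mean(probs, xs, phi, phi_inv):
    """Generalized mean (5): phi_inv(sum_i p_i * phi(x_i))."""
    return phi_inv(sum(p * phi(x) for p, x in zip(probs, xs)))

def translation_defect(probs, xs, a, phi, phi_inv):
    """Left side minus right side of (7); it vanishes for a KN function."""
    shifted = kn_mean(probs, [x + a for x in xs], phi, phi_inv)
    return shifted - (kn_mean(probs, xs, phi, phi_inv) + a)

D, tau = 2.0, 0.5                       # illustrative values only

def aff_phi(x): return 3.0 * x + 1.0    # affine solution
def aff_inv(y): return (y - 1.0) / 3.0
def exp_phi(x): return D ** (tau * x)   # exponential solution with a = 1, b = 0
def exp_inv(y): return math.log(y, D) / tau
def pow_phi(x): return x ** 3           # not a solution of (7)
def pow_inv(y): return y ** (1.0 / 3.0)

probs, xs, a = [0.3, 0.7], [1.0, 4.0], 2.5
print(translation_defect(probs, xs, a, aff_phi, aff_inv))  # zero up to rounding
print(translation_defect(probs, xs, a, exp_phi, exp_inv))  # zero up to rounding
print(translation_defect(probs, xs, a, pow_phi, pow_inv))  # clearly nonzero
```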

The affine solution leads immediately to the Shannon
entropy (4) under condition (3), whereas the exponential
solution leads one to consider exponential mean lengths

$$L_E = \frac{1}{\tau}\log_D \sum_{i=1}^{N} p_i D^{\tau n_i}, \qquad (8)$$

whose corresponding entropies are

$$S_R(p) = \frac{1}{\tau}\log_D \sum_{i=1}^{N} p_i^{1-\tau} \qquad (9)$$

under condition (3).
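A short Python sketch, with assumed function names and an arbitrary three-symbol distribution, shows that the exponential mean length (8) coincides with the Renyi entropy (9) once the lengths are fixed by (3), and that (9) approaches the Shannon entropy (4) for small $\tau$.

```python
import math

def shannon_entropy(probs, D=2):
    """Shannon entropy (4)."""
    return -sum(p * math.log(p, D) for p in probs)

def exponential_mean_length(probs, lengths, tau, D=2):
    """Exponential mean length (8) for tau != 0."""
    return math.log(sum(p * D ** (tau * n) for p, n in zip(probs, lengths)), D) / tau

def renyi_entropy(probs, tau, D=2):
    """Renyi entropy (9) for tau != 0."""
    return math.log(sum(p ** (1.0 - tau) for p in probs), D) / tau

probs = [0.5, 0.3, 0.2]                       # illustrative distribution
lengths = [-math.log(p, 2) for p in probs]    # condition (3)

print(exponential_mean_length(probs, lengths, 0.5), renyi_entropy(probs, 0.5))
print(renyi_entropy(probs, 1e-6), shannon_entropy(probs))
```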

It is imperative to emphasize that Renyi worked with
the code lengths as the independent variables, and not
with the number of different sequences of length $n_i$:

$$\omega_i = D^{n_i}. \qquad (10)$$

For if he had considered (10) as the set of independent
variables he would have obtained means of order $\tau$,

$$\mathfrak{M}_\tau(\omega) = \left(\sum_{i=1}^{N} p_i\,\omega_i^{\tau}\right)^{1/\tau},$$

leading to the expression

$$\mathfrak{M}_\tau(1/p) = \left(\sum_{i=1}^{N} p_i^{1-\tau}\right)^{1/\tau} \qquad (11)$$

as the entropy, rather than as the 'exponential entropy'.
Such an entropy would not possess the property of
additivity that its logarithm would restore.



The foregoing discussion provides a basis for understanding
the property of 'pseudo-additivity' that certain
entropies have been found to possess.

Consider the function

$$\phi(x) = \frac{D^{\tau x} - 1}{\tau}, \qquad (12)$$

which becomes affine in the limit $\tau \to 0$. Since (12) is
a KN function, its mean is equivalent to (8), and, consequently,
to the Renyi entropy, (9), when condition (3)
is imposed. However, were we to introduce (3) directly
into (12),

$$\phi(1/p_i) = \frac{p_i^{-\tau} - 1}{\tau}, \qquad (13)$$

where $\phi$ is now regarded as a function of $1/p_i = D^{n_i}$,
and then take its mean value, we would obtain an exponential
entropy,

$$\phi^{-1}\left(\sum_{i=1}^{N} p_i\,\phi(1/p_i)\right) = \mathfrak{M}_\tau(1/p),$$

rather than the Renyi entropy itself.
The weighted arithmetic mean of (13),

$$S_{HC}(p) = \sum_{i=1}^{N} p_i\,\phi(1/p_i) = \frac{\sum_{i=1}^{N} p_i^{1-\tau} - 1}{\tau}, \qquad (14)$$

is known as the Havrda-Charvat-Daroczy-Tsallis
(HCDT) entropy. The HCDT entropy is pseudo-additive
in that it is the solution of

$$S_{HC}(p \otimes q) = S_{HC}(p) + S_{HC}(q) + \tau S_{HC}(p)\,S_{HC}(q), \qquad (15)$$

and not of (6). The KN function, whose mean is equivalent
to the mean of (13),

$$\mathfrak{M}_{\bar\phi}(1/p) = \mathfrak{M}_{\phi}(1/p),$$

is

$$\bar\phi(1/p_i) = \tau\,\phi(1/p_i) + 1.$$

Its weighted arithmetic mean,

$$\bar S_{HC}(p) = \tau S_{HC}(p) + 1,$$

satisfies the functional equation

$$\bar S_{HC}(p \otimes q) = \bar S_{HC}(p)\,\bar S_{HC}(q), \qquad (16)$$

implying a power law solution [ref., p. 39], and the complete
loss of additivity. Hence, on the basis of equivalent
means we can transform pseudo-additivity, (15), into a
multiplicative relation, (16), showing that neither relation
has any thermodynamic meaning regarding the lack
of extensivity.
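The relations (15) and (16), and the additivity (6) recovered by taking the logarithm, can be verified numerically. The following Python sketch uses assumed function names and two arbitrary example distributions.

```python
import math
from itertools import product

def hcdt_entropy(probs, tau):
    """HCDT entropy (14): (sum_i p_i**(1 - tau) - 1) / tau."""
    return (sum(p ** (1.0 - tau) for p in probs) - 1.0) / tau

def renyi_entropy(probs, tau, D=2):
    """Renyi entropy (9)."""
    return math.log(sum(p ** (1.0 - tau) for p in probs), D) / tau

def direct_product(p, q):
    """Joint distribution of two independent sources."""
    return [pi * qj for pi, qj in product(p, q)]

p, q, tau = [0.6, 0.4], [0.7, 0.2, 0.1], 0.5   # illustrative values
pq = direct_product(p, q)
sp, sq, spq = hcdt_entropy(p, tau), hcdt_entropy(q, tau), hcdt_entropy(pq, tau)

print(spq, sp + sq + tau * sp * sq)                    # pseudo-additivity (15)
print(tau * spq + 1, (tau * sp + 1) * (tau * sq + 1))  # multiplicative form (16)
print(renyi_entropy(pq, tau),
      renyi_entropy(p, tau) + renyi_entropy(q, tau))   # additivity (6)
```

The Renyi entropy is recovered from the HCDT entropy as $\log_D(1 + \tau S_{HC})/\tau$, which is the logarithm referred to in the text.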

As (11) clearly shows, it is the logarithm of the mean
of the HCDT entropy that has physical meaning, and
this is the Renyi entropy. If we had insisted on working
with code lengths, and not with their probabilities,
we would have obtained the Renyi entropy directly from
introducing (12) into the mean code length (5).
EXPONENTIAL MEAN ENTROPY BOUNDS ON
MEAN NUMBER OF SEQUENCES 

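According to Jensen's inequality for a convex function

$$\sum_{i=1}^{N} p_i D^{\tau n_i} \ge D^{\tau \sum_{i=1}^{N} p_i n_i}. \qquad (17)$$

Taking logarithms and considering $\tau > 0$, there results

$$\frac{1}{\tau}\log_D\left(\sum_{i=1}^{N} p_i D^{\tau n_i}\right) \ge \sum_{i=1}^{N} p_i n_i, \qquad (18)$$

asserting that the exponential mean of parameter $\tau > 0$
is never inferior to the weighted arithmetic mean. This is
most easily shown for small $\tau$. Expanding the exponent
and then the logarithm in powers of $\tau$, we get to lowest
order

$$\frac{1}{\tau}\log_D \sum_{i=1}^{N} p_i D^{\tau n_i} = \sum_{i=1}^{N} p_i n_i + \frac{\tau \ln D}{2}\left\{\sum_{i=1}^{N} p_i n_i^2 - \left(\sum_{i=1}^{N} p_i n_i\right)^2\right\} + \cdots$$

The term in the curly brackets is never negative since it
is the variance of $n_i$. We must also restrict $\tau < 1$ in order
to ensure that the Renyi entropy is concave.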

Inequality (18) shows that the
Renyi entropy is bounded from below by the Shannon
entropy, and since the entropy is maximum for a uniform
distribution, we have the following hierarchy:

$$S_S \le S_R \le S_H,$$

where $S_H = \log_D N$ is the Hartley entropy.
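The hierarchy can be illustrated with a few lines of Python. The distribution, the chosen values of $\tau$ in $(0, 1)$ and the function names are assumptions made only for this sketch.

```python
import math

def shannon_entropy(probs, D=2):
    """Shannon entropy (4)."""
    return -sum(p * math.log(p, D) for p in probs)

def renyi_entropy(probs, tau, D=2):
    """Renyi entropy (9)."""
    return math.log(sum(p ** (1.0 - tau) for p in probs), D) / tau

def hartley_entropy(n_symbols, D=2):
    """Hartley entropy: log_D N."""
    return math.log(n_symbols, D)

def hierarchy_holds(probs, tau, D=2, tol=1e-12):
    """True when S_S <= S_R <= S_H, allowing for rounding error."""
    s_s = shannon_entropy(probs, D)
    s_r = renyi_entropy(probs, tau, D)
    s_h = hartley_entropy(len(probs), D)
    return s_s <= s_r + tol and s_r <= s_h + tol

probs = [0.6, 0.25, 0.1, 0.05]
for tau in (0.1, 0.5, 0.9):
    print(tau, hierarchy_holds(probs, tau))

# For the uniform distribution the three entropies coincide.
uniform = [0.25] * 4
print(shannon_entropy(uniform), renyi_entropy(uniform, 0.5), hartley_entropy(4))
```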

In terms of the number of different sequences of length
$n_i$, Jensen's inequality (17) becomes

$$\mathfrak{M}_\tau(\omega) \ge \mathfrak{M}_0(\omega),$$

which says that means of order $\tau > 0$ can never be inferior
to the geometric mean, the weighted arithmetic mean
($\tau = 1$) being a particular case. The mean inequality for
the same order and different argument [ref., p. 14],

$$\mathfrak{M}_\tau(\omega) \ge \mathfrak{M}_\tau(1/p), \qquad (19)$$

follows from the fact that according to the Kraft inequality
$\omega_i \ge p_i^{-1}$ for all $i$. The equality in (19) holds when (3)
is satisfied. The mean number of sequences is bounded
from below by the exponential of the Renyi entropy. As
a problem in majorization, we say that $\omega$ majorizes $p^{-1}$,

$$\omega \succ p^{-1}.$$



A similar, but not identical, result was found by Campbell,
who used Holder's inequality [ref., p. 24]

$$\mathfrak{M}_{\tau/\alpha}(\omega)\,\mathfrak{M}_1(1/\omega p) \ge \mathfrak{M}_\tau(1/p),$$

where $\alpha = 1 - \tau$, and the Kraft inequality, to obtain

$$\mathfrak{M}_{\tau/\alpha}(\omega) \ge \mathfrak{M}_\tau(1/p). \qquad (20)$$

The condition for the equality in (20) is not (3) but,
rather,

$$D^{-n_i} = \frac{p_i^{\alpha}}{\sum_{j=1}^{N} p_j^{\alpha}}, \qquad (21)$$

which has been referred to as an 'escort probability'.
Whereas (19) implies (20), the converse is not true. In
other words, inequality (20) holds for $\alpha = 1$, as (19)
clearly shows, so that (21) must also hold for $\alpha = 1$,
which is (3). In other words, (3) is sharper than (21).
Moreover, since inequality (20) holds for $\alpha < 1$, the same
must be true in (21). This condition has apparently gone
unappreciated [19].
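Inequality (20) and its equality condition (21) can be probed numerically. The Python sketch below assumes $\omega_i = D^{n_i}$ as in (10); the function names and example values are hypothetical.

```python
import math

def mean_of_order(weights, xs, t):
    """Weighted mean of order t (t != 0): (sum_i w_i * x_i**t) ** (1/t)."""
    return sum(w * x ** t for w, x in zip(weights, xs)) ** (1.0 / t)

def campbell_gap(probs, lengths, tau, D=2):
    """Left side minus right side of (20), with omega_i = D**n_i."""
    alpha = 1.0 - tau
    omega = [D ** n for n in lengths]
    inv_p = [1.0 / p for p in probs]
    return mean_of_order(probs, omega, tau / alpha) - mean_of_order(probs, inv_p, tau)

def escort_lengths(probs, tau, D=2):
    """Lengths from (21): D**(-n_i) = p_i**alpha / sum_j p_j**alpha."""
    alpha = 1.0 - tau
    z = sum(p ** alpha for p in probs)
    return [-math.log(p ** alpha / z, D) for p in probs]

probs, tau = [0.5, 0.3, 0.2], 0.5
shannon_lengths = [-math.log(p, 2) for p in probs]            # condition (3)

print(campbell_gap(probs, [1, 2, 2], tau))                    # positive
print(campbell_gap(probs, shannon_lengths, tau))              # positive
print(campbell_gap(probs, escort_lengths(probs, tau), tau))   # zero up to rounding
```

The integer lengths `[1, 2, 2]` satisfy Kraft's inequality (2) with equality for $D = 2$; the escort lengths are in general not integers, so the last line only exhibits the equality condition.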

'ESCORT' AVERAGES 

Escort averaging has been used in variational formulations
that maximize the pseudo-additive entropy (14)
with respect to escort expectations of thermodynamic
constraints. Pragmatically speaking, it leads to analytic
expressions for the variational equations, which
would otherwise not exist. If escort averaging has any
meaning at all, it must yield viable expressions for the
means, and functions of the means.

Mean entropies are special cases of generalized means,
where the variables, $-\log p_i$, and their weights, $p_i$, are
not independent. Rather than considering means of order
$\tau$, for which a demonstration that the mean is a monotonically
increasing function of its order is given in [ref.],
we shall consider the exponential entropy (11) and the
Renyi entropy (9).

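Differentiating

$$S^{\tau} = \sum_{i=1}^{N} p_i^{1-\tau},$$

where $S$ denotes the exponential entropy (11), with respect to $\tau$ yields

$$S^{\tau}\left(\log_D S + \tau\,\frac{d\log_D S}{d\tau}\right) = -\sum_{i=1}^{N} p_i^{1-\tau}\log_D p_i. \qquad (22)$$

At a stationary point, $d\log_D S/d\tau = 0$, and

$$S_R = \log_D S = -\sum_{i=1}^{N} p_i^{\alpha}\log_D p_i \Big/ \sum_{i=1}^{N} p_i^{\alpha}, \qquad (23)$$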
where again $\alpha = 1 - \tau$. As $\alpha \to 1$ the entropy (23)
transforms into the Shannon entropy, (4). Expression
(23) states that the logarithm of the exponential entropy
(11), which is the Renyi entropy, (9), is the escort average
of $-\log_D p_i$. The second derivative of (22), evaluated at
the stationary point is

$$\frac{\tau}{\ln D}\,\frac{d^2\log_D S}{d\tau^2} = \frac{\sum_{i=1}^{N} p_i^{\alpha}(\log_D p_i)^2}{\sum_{i=1}^{N} p_i^{\alpha}} - \left(\frac{\sum_{i=1}^{N} p_i^{\alpha}\log_D p_i}{\sum_{i=1}^{N} p_i^{\alpha}}\right)^2 \ge 0,$$

because the right-hand side is the variance of $-\log_D p_i$
under escort averaging. Hence, $d^2\log_D S/d\tau^2 > 0$ for $\tau > 0$,
and $d^2\log_D S/d\tau^2 < 0$ for $\tau < 0$, implying that there
are two extrema: a local maximum for $\tau < 0$ and a local
minimum for $\tau > 0$. This requires $d\log_D S/d\tau \le 0$ at
$\tau = 0$, where the equality sign applies to the degenerate
case where the extremes coincide in a point of inflection
at $\tau = 0$.

Now, if it can be shown that $d\log_D S/d\tau > 0$ at $\tau = 0$,
then $\log_D S$ has no extrema as a function of $\tau$, and
is, in fact, a monotonically increasing function of $\tau$ for
all values of $\tau$. From this we will conclude that $\log_D S$
cannot be expressed as an escort average (23) derived
from a stationary condition, since such a condition does
not exist.

Writing (22) in the form

$$\frac{d\log_D S}{d\tau} = -\frac{\sum_{i=1}^{N} p_i^{1-\tau}\log_D p_i + S^{\tau}\log_D S}{\tau S^{\tau}},$$

it is apparent that the ratio on the right-hand side is of
the form 0/0 as $\tau \to 0$ because $\log_D S \to S_S$ in that
limit. With the aid of L'Hopital's rule we get

$$\frac{2}{\ln D}\lim_{\tau \to 0}\frac{d\log_D S}{d\tau} = \sum_{i=1}^{N} p_i(\log_D p_i)^2 - S_S^2 \ge 0.$$

The inequality follows from the fact that the right-hand
side is the variance of $-\log_D p_i$. Hence, $d\log_D S/d\tau > 0$
as $\tau \to 0$, and so it is positive for all $\tau$. This implies
that the logarithm of the exponential entropy (11) is an
increasing function of $\tau$, and that no stationary point
given by condition (23) exists.
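A numerical sketch in Python supports this conclusion for one example. The function names and the test distribution are assumptions; the code compares the Renyi entropy (9) with the escort average on the right-hand side of (23) over a grid of orders.

```python
import math

def renyi_entropy(probs, tau, D=2):
    """Renyi entropy (9), i.e. log_D of the exponential entropy (11)."""
    return math.log(sum(p ** (1.0 - tau) for p in probs), D) / tau

def escort_average(probs, tau, D=2):
    """Right-hand side of (23) with alpha = 1 - tau."""
    alpha = 1.0 - tau
    z = sum(p ** alpha for p in probs)
    return -sum(p ** alpha * math.log(p, D) for p in probs) / z

probs = [0.5, 0.3, 0.2]                          # illustrative, non-uniform
taus = [-2.0, -1.0, -0.5, 0.25, 0.5, 0.9]
values = [renyi_entropy(probs, t) for t in taus]

print(values == sorted(values))                  # increasing in tau on this grid
for t in taus:
    # By (22), tau * dS_R/dtau equals this difference; it never vanishes here.
    print(t, escort_average(probs, t) - renyi_entropy(probs, t))
```

On this grid the difference has the sign of $\tau$, so the slope is positive on both sides of $\tau = 0$ and the stationary condition (23) is never met.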

This is easily confirmed from the expression for the
Renyi entropy (9). Differentiating with respect to $\tau$ gives

$$\frac{dS_R}{d\tau} = -\frac{1}{\tau}\left(S_R + \frac{\sum_{i=1}^{N} p_i^{1-\tau}\log_D p_i}{\sum_{i=1}^{N} p_i^{1-\tau}}\right).$$

The stationary condition is again given by the escort average,
(23). The product $\tau\,d^2S_R/d\tau^2 > 0$ at the stationary
point so that we would have a local minimum for $\tau > 0$ and
a local maximum for $\tau < 0$. This would require the curve
of $S_R$ versus $\tau$ to have a negative slope as it passes through
$\tau = 0$. The demonstration that $S_R$ has no extrema when
considered as a function of $\tau$ follows exactly as before.

Hence, the Renyi entropy, or for that matter any mean,
cannot be expressed as an escort average because
that would violate the condition that the mean is an increasing
function of $\tau$. It is this property, in fact, which
guarantees that the arithmetic mean ($\tau = 1$) $\ge$ geometric
mean ($\tau = 0$) $\ge$ harmonic mean ($\tau = -1$).
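The ordering of the harmonic, geometric and arithmetic means as members of one increasing family can be sketched as follows. The helper name, weights and values are illustrative assumptions.

```python
import math

def mean_of_order(weights, xs, tau):
    """Weighted mean of order tau; tau = 0 is taken as the geometric-mean limit."""
    if tau == 0:
        return math.exp(sum(w * math.log(x) for w, x in zip(weights, xs)))
    return sum(w * x ** tau for w, x in zip(weights, xs)) ** (1.0 / tau)

weights = [0.2, 0.5, 0.3]
xs = [1.0, 4.0, 9.0]
orders = [-1, -0.5, 0, 0.5, 1]
means = [mean_of_order(weights, xs, t) for t in orders]

for t, m in zip(orders, means):
    print(t, m)                  # harmonic, ..., geometric, ..., arithmetic
print(means == sorted(means))    # the mean increases with its order
```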



EXTENT OF A DISTRIBUTION 

In the next to the last section we have found that the
exponential of the Renyi entropy is the lower bound on
the mean value of the number of different sequences of order
$\tau$, whose equivalent mean was the mean of the HCDT
entropy. Exponential mean entropies have been shown to
be measures of the extent of a distribution.

The measure of the extent of a distribution is inherently
related to the error that is committed by using
an estimated probability distribution, $q$, when the 'true'
probability distribution is $p$. Renyi [20] has referred to
this as 'information gain', while Kullback [21] used the
term 'directed divergence', being based on the Shannon
inequality

$$\mathcal{E}_S(q|p) = \sum_{i=1}^{N} p_i \log_D\frac{p_i}{q_i} \ge 0. \qquad (24)$$

The quantity $-\sum_{i=1}^{N} p_i \log_D q_i$ is referred to as the
inaccuracy [23]. Inequality (24) is easily seen to be a consequence
of the arithmetic-geometric mean inequality:

$$\prod_{i=1}^{N}\left(\frac{q_i}{p_i}\right)^{p_i} \le \sum_{i=1}^{N} q_i \le 1. \qquad (25)$$



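As a generalization of (24) we may consider the generating
function

$$\phi(q_i/p_i) = \frac{(q_i/p_i)^{\tau} - 1}{\tau}, \qquad (26)$$

since the negative of its weighted arithmetic average in
the limit as $\tau \to 0$ is

$$-\lim_{\tau \to 0}\sum_{i=1}^{N} p_i\,\phi(q_i/p_i) = \sum_{i=1}^{N} p_i \ln\frac{p_i}{q_i} = \mathcal{E}_S(q|p)\,\ln D.$$

The mean of (26) in the same limit is

$$\lim_{\tau \to 0}\phi^{-1}\left(\sum_{i=1}^{N} p_i\,\phi(q_i/p_i)\right) = D^{-\mathcal{E}_S(q|p)} \le 1,$$

which we will see to be related to the extent of a distribution.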
The arithmetic average of (26),

$$\sum_{i=1}^{N} p_i\,\phi(q_i/p_i) = \frac{\sum_{i=1}^{N} p_i^{1-\tau} q_i^{\tau} - 1}{\tau},$$

has been referred to as the error incurred when the distribution
$q$ is used instead of the 'true' distribution, $p$
[ref., p. 208]. The mean value of (26) has the equivalent
mean

$$\mathfrak{M}_\phi(q/p) = \mathfrak{M}_\tau(q/p) = \left(\sum_{i=1}^{N} p_i^{1-\tau} q_i^{\tau}\right)^{1/\tau} \le 1, \qquad (27)$$

since $\phi(x)$ is a linear function of $x^{\tau}$ for $\tau \neq 0$ [ref., p.
68]. The inequality in (27) follows directly from Holder's
inequality,

$$\sum_{i=1}^{N} p_i^{1-\tau} q_i^{\tau} \le \left(\sum_{i=1}^{N} p_i\right)^{1-\tau}\left(\sum_{i=1}^{N} q_i\right)^{\tau} \le 1,$$

for $0 < \tau < 1$. The second inequality makes allowance
for incomplete distributions.

The error function with a parameter $\tau$,

$$\mathcal{E}_R(q|p) = -\frac{1}{\tau}\log_D \sum_{i=1}^{N} p_i^{1-\tau} q_i^{\tau} \ge 0, \qquad (28)$$

is constructed in analogy to the Renyi entropy [ref., p.
208]. In the $\tau = 0$ limit, (28) reduces to the Shannon
error function (24).

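Because of the inequality of the means [ref., p. 26]

$$\mathfrak{M}_\tau(x) \le \mathfrak{M}_s(x),$$

for $\tau < s$, with equality iff the probability distribution is
uniform, the mean

$$\mathfrak{M}_\tau(q/p) = \left(\sum_{i=1}^{N} p_i^{1-\tau} q_i^{\tau}\right)^{1/\tau} = D^{-\mathcal{E}_R(q|p)} \qquad (29)$$

provides a measure of the extent of a distribution. The
mean (29) is homogeneous in $q$, and satisfies

$$D^{-\mathcal{E}_S(q|p)} \le D^{-\mathcal{E}_R(q|p)} \le \sum_{i=1}^{N} q_i. \qquad (30)$$

The lower limit corresponds to the geometric mean

$$\prod_{i=1}^{N}\left(\frac{q_i}{p_i}\right)^{p_i}$$

and is least affected by variations in $q/p$, while the
upper limit

$$\sum_{i=1}^{N} q_i$$

constitutes the 'range' of the distribution. The range is the
most elementary measure of the extent of a distribution.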
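The error functions (24) and (28) and the chain (30) can be evaluated directly. In the Python sketch below the function names and the two distributions, one of them deliberately incomplete, are assumptions made for illustration.

```python
import math

def shannon_error(q, p, D=2):
    """Shannon error function (24): sum_i p_i * log_D(p_i / q_i)."""
    return sum(pi * math.log(pi / qi, D) for pi, qi in zip(p, q))

def renyi_error(q, p, tau, D=2):
    """Error function (28): -(1/tau) * log_D(sum_i p_i**(1-tau) * q_i**tau)."""
    s = sum(pi ** (1.0 - tau) * qi ** tau for pi, qi in zip(p, q))
    return -math.log(s, D) / tau

p = [0.2, 0.5, 0.3]     # 'true' distribution (illustrative)
q = [0.5, 0.2, 0.1]     # incomplete estimate: sums to 0.8
tau, D = 0.5, 2

lower = D ** (-shannon_error(q, p))      # geometric mean of q_i / p_i
middle = D ** (-renyi_error(q, p, tau))  # mean of order tau, as in (29)
upper = sum(q)                           # the 'range'

print(lower, middle, upper)
print(lower <= middle <= upper)          # chain (30)
print(renyi_error(q, p, 1e-6), shannon_error(q, p))  # tau -> 0 limit of (28)
```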

The difference between an ordinary and an incomplete
random variable is that the latter is not defined at every
point in the sample space [20, p. 570]. Points where
the random variable is undefined are said to be unobservable,
and $\sum_{i=1}^{N} q_i < 1$ is the probability that the
outcome will be observable.

The arithmetic-geometric mean inequality (25) can be
written as

$$\sum_{i=1}^{N} q_i \ge D^{\sum_{i=1}^{N} p_i(\log_D q_i - \log_D p_i)}, \qquad (31)$$

which combines the first and second inequalities in (30). Shannon's
inequality (24) guarantees that the right-hand side
of (31) is less than unity. The equality sign holds in
(31) when $p_i = \lambda q_i$, where $\lambda = \left(\sum_{i=1}^{N} q_i\right)^{-1}$, thus ensuring
that $p$ is a complete distribution. This condition
is obtained by requiring that $-\sum_{i=1}^{N} p_i \log_D q_i$ be
a minimum subject to $\sum_{i=1}^{N} q_i = \text{const}$. In fact, an essential
part of the coding theorem for a noiseless channel
is to show that the minimum of $-\sum_{i=1}^{N} p_i \log_D q_i$ is
$-\sum_{i=1}^{N} p_i \log_D p_i$ subject to the constraint $\sum_{i=1}^{N} q_i = 1$
[ref., pp. 17-20]. The latter constraint implies $\lambda = 1$, and
the Shannon error function vanishes leading to an equality
in (24).

However, there is another way the equality can be
satisfied in (31): The minimum of $\sum_{i=1}^{N} q_i$ subject to
the constraint $\sum_{i=1}^{N} p_i \log_D q_i = \text{const}$ is $D^{-\mathcal{E}_S(q|p)}$.
Introducing the Lagrange multiplier $\lambda = q_i/p_i$, where
$\lambda = \sum_{i=1}^{N} q_i$, into the Shannon error function (24) results in

$$\mathcal{E}_S(q|p) = \log_D\left(1 \Big/ \sum_{i=1}^{N} q_i\right). \qquad (32)$$

An incomplete distribution $\sum_{i=1}^{N} q_i < 1$ leads to a finite
error, $\mathcal{E}_S > 0$. The argument in the logarithm of (32)
can be interpreted as the number of digits necessary to
specify the set of observable events. The smaller the set,
the greater the number of digits that will be required.
In this respect, (32) is the antithesis of Boltzmann's
principle, where $\left(\sum_{i=1}^{N} q_i\right)^{-1}$ is paired to the 'thermodynamic
probability', and the Shannon error function (24)
to the entropy. Whereas Boltzmann's principle asserts
that the greatest number of complexions that correspond
to a single macroscopic state possesses the greatest entropy,
(32) affirms that the greatest number of digits needed
to specify a given set, $\left(\inf \sum_{i=1}^{N} q_i\right)^{-1}$, corresponds to
the greatest error in discriminating between two probability
distributions. In other words, Boltzmann's principle
is a measure of attenuation, whereas (32) is a measure
of accentuation.
This is related to the problem of "how to keep the
forecaster honest". A forecaster uses an estimated
probability distribution $q$ to determine the outcome of
events whose true distribution is $p$. His fee for correct
prediction $f(q)$ is to be paid after it is known that the
event has occurred. His expected fee is

$$\sum_{i=1}^{N} p_i f(q_i),$$

which, if he is honest, must satisfy

$$\sum_{i=1}^{N} p_i f(q_i) \le \sum_{i=1}^{N} p_i f(p_i).$$

The Shannon error function (24) identifies $f$ with the
logarithm. Then $\inf \sum_{i=1}^{N} q_i$ in (32) is that state with
the lowest degree of predictability.

If $q$ is the uniform distribution then the error (28) reduces
to the difference in entropies so that the exponential
of this difference is equal to the mean:

$$\mathfrak{M}_\tau(1/Np) = D^{S_R(p) - S_H(N)}. \qquad (33)$$

The closer $p$ is to the uniform distribution, the larger will
be the mean value (33). Interpreting $\mathfrak{M}_\tau$ as a measure
of extent, small values of $\mathfrak{M}_\tau$ imply that the probability
measure is concentrated on a set of small $q$-measure for
$0 < \tau < 1$.

As (33) shows, the difference between the Hartley and
Renyi entropies is an exponential measure of the magnitude
of $\mathfrak{M}_\tau$, and hence of the extent of the distribution.
The fact that a small value of $\mathfrak{M}_\tau$ implies a small probability
measure requires $\tau$ to be confined to the interval
$(0, 1]$, and is thus related to the condition of concavity of
the Renyi entropy. Only for $\tau \in (0, 1]$ will the exponent
in (33) be a true difference in entropies.
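Relation (33) lends itself to a direct numerical check. The Python sketch below uses hypothetical function names and three example distributions of increasing uniformity.

```python
import math

def renyi_entropy(probs, tau, D=2):
    """Renyi entropy (9)."""
    return math.log(sum(p ** (1.0 - tau) for p in probs), D) / tau

def extent(probs, tau):
    """Mean of order tau of 1/(N p_i), the left-hand side of (33)."""
    n = len(probs)
    return sum(p * (1.0 / (n * p)) ** tau for p in probs) ** (1.0 / tau)

tau, D = 0.5, 2
peaked = [0.9, 0.05, 0.05]
flatter = [0.4, 0.3, 0.3]
uniform = [1.0 / 3.0] * 3

for probs in (peaked, flatter, uniform):
    rhs = D ** (renyi_entropy(probs, tau) - math.log(len(probs), D))
    print(extent(probs, tau), rhs)   # both sides of (33)
```

The printed values grow towards 1 as the distribution approaches uniformity, in line with the reading of $\mathfrak{M}_\tau$ as a measure of extent.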